1. INTRODUCTION

Computer vision is undoubtedly one of the fastest-growing research areas in computer science today.
Its goal is to enable computer-based machines and robots to see and understand the surrounding
world. The technology is expected to produce results from both static and dynamic scenes, and
static scene applications are naturally the basis for dynamic ones. Image recognition is the earliest form of
this technology. To recognize an image, a system classifies it into a specific domain [1]; such a
system is therefore also known as a classification system.

From solving image classification problems, the application of computer vision has spread into many
areas. It can be used for handwritten letter recognition, access control, camera surveillance, human
detection, human tracking, distinguishing textile products, classifying animals, and even military purposes [2-5].

K-nearest neighbor (k-NN), support vector machine (SVM), and machine learning (ML) are examples of
popular techniques for solving image recognition problems. The simplest of the three is k-NN, which
classifies an image by searching for the most similar images in the dataset [6-11]. SVM basically
projects the input data into a feature space, which results in a linear classifier of the data. Applications of
SVM in the image recognition field include handwriting recognition and satellite image analysis. In machine
learning, as in any recognition and classification application, feature extraction plays an important role [12].
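
The k-NN idea described above can be sketched in a few lines. This is only an illustrative toy, not code from this study: the two-dimensional feature vectors, the labels, and the function name `knn_classify` are hypothetical stand-ins for real image features.

```python
from collections import Counter
import math

def knn_classify(query, dataset, k=3):
    """Return the majority label among the k dataset items closest to query.

    dataset is a list of (feature_vector, label) pairs.
    """
    # Sort the stored examples by Euclidean distance to the query vector.
    neighbors = sorted(dataset, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors standing in for extracted image features.
toy_dataset = [
    ((0.1, 0.2), "horse"),
    ((0.2, 0.1), "horse"),
    ((0.0, 0.3), "horse"),
    ((0.9, 0.8), "bird"),
    ((0.8, 0.9), "bird"),
    ((1.0, 0.7), "bird"),
]
predicted = knn_classify((0.85, 0.85), toy_dataset, k=3)
print(predicted)
```

No training step is involved; the cost is paid at query time, because every stored example must be compared with the query.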

Arguably, machine learning is the leading technology in image recognition right now, with deep
learning being the most successful of its subsets [13]. Deep learning (DL) is a neural network with
multiple hidden layers. The multi-layer perceptron (MLP) is the traditional type of DL, in which every element of
the previous layer is connected to every element of the next layer. In the image recognition field, three
deep learning techniques are notably superior: the convolutional neural network (CNN), the restricted Boltzmann
machine (RBM) based model, and stacked denoising autoencoders (SDA). Of these, only CNN is a supervised
type, while RBM and SDA use an unsupervised approach. CNN excels in automatic feature learning,
in robustness to scale, rotation, and translation, and in generalization that avoids overfitting.

The growth of deep learning technology in recent years has been boosted by three factors: (1) the
availability of plentiful publicly accessible datasets, (2) the rapid increase in GPU-based computing power, and
(3) the rise of new machine learning platforms such as TensorFlow, Keras, and Theano that are released as open
source [9]. As for datasets, various online sources provide open data that can be downloaded freely.
One is the University of California, Irvine (UCI), which provides a machine learning repository loaded with a
wide variety of datasets. From 2007 to 2019 this repository accumulated 494 datasets, and it is still growing.
In addition, Google recently released a new service called Dataset Search, a search tool covering
25 million datasets. This tremendous number of freely accessible datasets, combined with freely available
powerful tools and cheaper computing power, has resulted in a boost in machine learning research.

Recognizing an object from an image is an easy task for a human, but for decades it was a daunting task
for a machine or computer. Thanks to CNN, training a computer to classify an image into several categories
has become an easier task. Popular image recognition problems arise from several open datasets such as
ImageNet, MNIST, CIFAR, and Pascal. Taking the ImageNet challenge as an example, the CNN technique has
proved successful in achieving 95% accuracy.

A more challenging problem is recognizing fine-grained images, that is, images of objects that look
similar across different classes, such as recognizing dog, bird, or cat breeds from an image. This is of
course more difficult than classifying images into horse and bird. An experiment using a GoogLeNet model
trained on the ImageNet dataset and then retrained on a fine-grained fashion dataset yielded 62% accuracy [14].
Thus fine-grained image recognition remains challenging.

Distinguishing batik from its imitation is an image recognition problem that has been only slightly explored.
Batik is a traditional textile of Indonesia produced by the hot wax resist dyeing technique. There are two ways to
put hot wax on a fabric in the batik making process. The first is through the use of canting tulis, and the second is
through the use of canting cap. Both canting tulis and canting cap are traditional tools for making batik. Batik
that is made wholly with canting tulis is called batik tulis or handwritten batik, and batik that
is made wholly with canting cap is called batik cap or stamped batik. Batik that combines both
tools and techniques is called batik kombinasi or combination batik.

Handwritten batik, stamped batik, and combination batik are the three kinds known as true batik. The
process of making batik is as follows. First, a pattern is drawn onto the fabric with a pencil. This job can
take days for a complex pattern. The work then continues with the wax application job. For handwritten batik
this stage is done with canting tulis, and for stamped batik it is done with canting cap. This job may take
longer than the pattern drawing job, especially for handwritten batik. After all of the pattern is covered in
wax, the work moves on to the first coloring. If multiple colors are desired in a piece of fabric, the
first colored area must be sealed with wax before the next coloring job can take place. To detach the wax, a
boiling process is required: the wax that sticks to the fabric is removed while the fabric is boiled in
water. Any batik-like product that is produced using another technique is considered imitation
batik. Imitation batik can be made faster and cheaper than true batik. Imitation batik making processes
include color printing, color removal, cold wax printing, and combinations of these. The imitation can indeed be
made very similar to the original, resulting in a cheaper product that fools ordinary people.

The Center for Crafts and Batik in Yogyakarta, Indonesia has long been fighting imitation batik
products. We now want to try machine learning to help distinguish batik from its imitation, which is a hard
task. Even a batik expert has difficulty telling a good imitation from the real one, because batik forgery
techniques are now so advanced that they produce very good imitations.
There are traits that can be used to differentiate between batik and its imitation, but they are very subtle and
encompass more than the visual aspect. A batik evaluator identifying batik products must consider these
traits and then rely on instinct to deduce whether the product is genuine or fake. Evaluators usually work in a
group so that they can discuss their conclusions, which may differ from each other; it is common for batik
evaluators to reach different conclusions. It would not be excessive to say that the batik and non-batik
identification problem goes beyond the fine-grained image recognition problem. For a fine-grained image
recognition problem, the image itself is sufficient for an expert to do the classification manually. For the
batik and non-batik problem, however, even an expert cannot always draw a firm conclusion from a real object,
let alone from an image only.
2. RESEARCH METHOD
2.1. Related research 

Classification of batik patterns has been done before [15-17]. In batik classification, the CNN
method proved to be the best choice [15-16]. However, all previous work focused on identifying the batik
motif or the shape of the batik pattern. Research on identifying the authenticity of batik through
machine learning has not yet been found. Batik authenticity can, however, be identified by
manually observing visual, physical, and chemical traits, as stated by Masiswo et al. [18]. This research tries
to find an automated solution for identifying batik authenticity, which is more difficult than classifying
batik patterns and is a task that even experts cannot do easily.


2.2. Batik expert group discussion 
Preparing the dataset was the busiest part of the research stages. A ready-to-use labeled dataset for
classification between batik and non-batik was not yet available, so it had to be compiled first. At the
beginning we realized that this project would need particular images. We doubted that a random image taken
from a batik or non-batik sample would do for classification. Thus a batik expert group discussion was held to
decide which kind of batik and non-batik image would work for classification.
The results of the batik expert group discussion are:
— An image of a batik or non-batik sample taken with a digital microscope at 60 times magnification,
resulting in image type 1, depicted in Figure 1.
— An image of a batik or non-batik sample taken with a digital microscope at 20 times magnification,
resulting in image type 2, depicted in Figure 2.
— An image of a batik or non-batik sample taken with a regular smartphone camera with default settings from
a distance of about 25 cm, resulting in image type 3.
The experts say that in the first and second types of image the wax trait can be seen. The wax trait, especially
from the first wax application job, is important for identifying the authenticity of a batik product. The type 3
images are intended for mobile applications, should a digital microscope not be available.


Figure 1. Sample of image type 1 for non-batik class
Figure 2. Sample of image type 2 for batik class


2.3. Gathering samples 
Samples were divided into two large classes and five smaller classes. The two large classes are
(a) batik and (b) non-batik. The five smaller classes are:
— Batik tulis (BT),
— Batik cap (BC), both of which fall in the batik class,
— Print warna (PW),
— Cabut warna (CW),
— Print malam dingin (PM), all three of which fall in the non-batik class.
Samples were gathered in various forms, some as sheets of textile and some as clothes. To decide which class
each sample should join, an examination by a group of batik experts was carried out. Each examined sample
was then placed in the suggested class, ready for image capture in the next step.
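
The relation between the five smaller classes and the two large classes can be written down as a simple lookup. The sketch below is illustrative only; the names `FINE_TO_COARSE` and `coarse_label` and the label strings are assumptions, not identifiers used in this study.

```python
# Fine-grained class codes from this section mapped to the two large classes.
FINE_TO_COARSE = {
    "BT": "batik",      # batik tulis
    "BC": "batik",      # batik cap
    "PW": "non_batik",  # print warna
    "CW": "non_batik",  # cabut warna
    "PM": "non_batik",  # print malam dingin
}

def coarse_label(fine_code):
    """Return the large class (batik or non_batik) for a five-class code."""
    return FINE_TO_COARSE[fine_code.upper()]

print(coarse_label("PM"))
```

With such a mapping, a dataset labeled once with the five codes can serve both the two-class and the five-class problem.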


2.4. Image capture 
The first type of image, at 60 times magnification, and the second type, at 20 times
magnification, were taken with a digital microscope. The images were taken in a room with sufficient
daylight. The digital microscope also has its own lighting located at the tip near its lens. The
microscope lighting was set so that the image is not washed out by the bright light coming from its
lamp. The microscope settings were arranged so that the output image is focused and sharp, with as much detail
captured as possible. The third type of image was taken with a regular smartphone camera with default
settings. We used an external light source to help take pictures of the third type. The image files were then
divided and labeled into classes, organized in folders.
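
A folder-per-class layout lets the label be read directly from the directory name. The sketch below assumes a hypothetical layout of the form `<root>/<image type>/<class code>/<file>`; the actual folder names used in this study are not stated, so the paths, file names, and the helper `list_labeled_images` are illustrative.

```python
from pathlib import Path
import tempfile

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}

def list_labeled_images(root):
    """Yield (path, label) pairs, taking the label from the parent folder name."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path, path.parent.name

# Build a throwaway folder tree to show the idea (file names are made up).
with tempfile.TemporaryDirectory() as root:
    for label, name in [("BT", "s001_a.jpg"), ("PM", "s002_a.jpg")]:
        folder = Path(root) / "60x" / label
        folder.mkdir(parents=True, exist_ok=True)
        (folder / name).touch()
    labels = [label for _, label in list_labeled_images(root)]

print(labels)
```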


2.5. Inception 

As CNN grew to become a leader among computer vision algorithms, many CNN models
emerged. One of those models is the Inception model. On the ImageNet dataset, which holds millions of images
for 1000 classes, Inception models were trained and scored a 3.5% error rate [19]. InceptionV3 is based on its
predecessors, Inception V1 and Inception V2, with a modification in the initial structure. The Inception
architecture was introduced in the GoogLeNet model, which in 2014 was recognized as the state of the art
in image recognition. The basic idea is that, instead of deciding on the size of convolution to use, the model
performs all the convolutions and lets training decide which is best. This method allows the model to find both
local features and more abstract features by utilizing small and large convolutions along the way.
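
The "do all the convolutions" idea can be illustrated with a deliberately simplified one-dimensional toy. It is not the InceptionV3 architecture: real Inception blocks work on 2-D feature maps with learned kernels and also include 1x1 bottlenecks and a pooling branch. The fixed averaging kernels and the function names below are assumptions made for illustration.

```python
def conv1d_same(signal, kernel):
    """1-D sliding-window filter with zero padding; output length equals input length.

    Only odd kernel widths are supported in this sketch.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def inception_block_1d(signal, kernels):
    """Apply every kernel in parallel and stack the results as output channels."""
    return [conv1d_same(signal, kernel) for kernel in kernels]

# Hypothetical fixed kernels of width 1, 3 and 5; in a real CNN these are learned.
kernels = [
    [1.0],
    [1 / 3] * 3,
    [1 / 5] * 5,
]
signal = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
channels = inception_block_1d(signal, kernels)
for kernel, channel in zip(kernels, channels):
    print(len(kernel), [round(v, 2) for v in channel])
```

The narrow kernel keeps the local detail, the wider ones respond to a larger neighborhood, and the following layers are free to weight whichever channel is most useful.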


2.6. Mobilenet 

MobileNet is a streamlined architecture that uses depthwise separable convolutions. The result of
this architecture is a lightweight CNN model that is efficient for mobile and embedded device applications [1],
[20-22]. MobileNet uses 3x3 depthwise separable convolutions, which reduce the computation to 8 to 9 times less
than that of standard convolutions at the price of only a small reduction in accuracy. Notably, on a
fine-grained recognition problem such as the Stanford Dogs dataset, the tiny MobileNet model achieves only
slightly lower accuracy than the state-of-the-art result (83.3% versus 84%) at greatly reduced computation and size.
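
The 8 to 9 times figure can be checked with a short multiply-add count. The sketch assumes the usual cost model for a convolution layer with kernel size k, m input channels, n output channels, and an f x f output feature map; the specific layer size below is a hypothetical example, not a layer taken from this study.

```python
def standard_conv_cost(k, m, n, f):
    """Multiply-adds of a standard k x k convolution."""
    return k * k * m * n * f * f

def separable_conv_cost(k, m, n, f):
    """Multiply-adds of a depthwise k x k convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * m * f * f
    pointwise = m * n * f * f
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 128 input channels, 256 output channels, 14x14 map.
k, m, n, f = 3, 128, 256, 14
speedup = standard_conv_cost(k, m, n, f) / separable_conv_cost(k, m, n, f)
print(f"{speedup:.2f}x fewer multiply-adds")
```

The ratio simplifies to 1 / (1/n + 1/k^2), so for 3x3 kernels it approaches 9 as the number of output channels grows.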


2.7. Retraining the model 

Transfer learning is a method in machine learning that takes the knowledge learned in one
training process and uses it to solve a new problem that is different from, but related to, the old one. By
analogy, in real life we can take the knowledge gathered in making a device with a microcontroller and apply
it to build a similar device based on a single-board computer, with some adaptation required. In machine
learning, we can take a model trained on a bigger and more complex dataset and train it
a little further to solve a problem with a simpler dataset. That is what transfer learning does.

The transfer learning process is:
— Train a model on a big and complex dataset,
— Keep the model and replace the last layer (the output layer) with the desired output layer for the new
dataset,
— Retrain the model on the new dataset by modifying only the connections between the last layer and the
previous layer.
Thus the retraining process modifies only the weight parameters of the last layer, which connects to the output
defined by the label vector [19, 23-25]. After that we have a new model whose output layer matches the
classes of the new dataset and which can be used to solve the new problem.
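
The steps above can be sketched with a toy example in plain Python. It is not the retraining code used in this study: the "pretrained" body is a small hand-written set of frozen weights, the dataset is made up, and the new output layer is a single logistic unit. What it does show is the key point that only the last layer is updated.

```python
import math

# Stand-in for a pretrained network body (hypothetical values).
# These weights are frozen and never updated below.
FROZEN_WEIGHTS = [[0.7, 0.5], [-0.2, -0.5], [0.0, -0.2], [0.6, -0.4]]

def frozen_features(x):
    """Fixed feature extractor: linear projection followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in FROZEN_WEIGHTS]

def predict(head, bias, feats):
    """New output layer: a single logistic unit on top of the frozen features."""
    z = sum(w * f for w, f in zip(head, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(head, bias, data):
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = predict(head, bias, frozen_features(x))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

# Hypothetical new two-class dataset (label 1 versus label 0).
data = [((1.0, 0.2), 1), ((0.9, 0.1), 1), ((0.1, 1.0), 0), ((0.2, 0.8), 0)]

head, bias = [0.0] * 4, 0.0  # new output layer, trained from scratch
frozen_before = [row[:] for row in FROZEN_WEIGHTS]
loss_before = mean_loss(head, bias, data)

for _ in range(200):  # gradient descent on the new layer only
    for x, y in data:
        feats = frozen_features(x)
        error = predict(head, bias, feats) - y
        head = [w - 0.5 * error * f for w, f in zip(head, feats)]
        bias -= 0.5 * error

loss_after = mean_loss(head, bias, data)
print(round(loss_before, 3), round(loss_after, 3))
```

Because the feature extractor is never touched, each training step is cheap, which is consistent with retraining being feasible without a GPU.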

Fortunately, pretrained models can nowadays be obtained from online repositories. TensorFlow,
a machine learning platform released by Google that runs under the Python language, makes a
number of pretrained models available on its website. We can obtain pretrained Inception, MobileNet, VGG, and
other models ready to use in our TensorFlow environment. For this research we use ready-to-use InceptionV3
and MobileNetV2 models pretrained on the ImageNet dataset. By utilizing pretrained models we no longer
need to train the models first, so we can skip the first step in the transfer learning process.

In general, this research is aimed at the identification of batik and non-batik products; as a more detailed
task, the identification is narrowed to classification into the five smaller classes mentioned in Section 2.3.
For both problems we use two models, InceptionV3 and MobileNetV2. Because three types of image are used for
classification, 12 models are trained in total (2 problems x 2 architectures x 3 image types).
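
The count of 12 follows from enumerating every combination of architecture, problem, and image type. The short sketch below lists them; the label strings follow the naming that appears in Table 2 ("bnb" for the two-class problem, "all" for the five-class problem), while the variable names are illustrative.

```python
from itertools import product

ARCHITECTURES = ["mobilenet", "inceptionV3"]
PROBLEMS = ["bnb", "all"]            # bnb: batik / non-batik, all: five classes
IMAGE_TYPES = ["20", "60", "frame"]  # 20x, 60x, and full frame images

configurations = [
    f"{arch} {problem} {image_type}"
    for arch, problem, image_type in product(ARCHITECTURES, PROBLEMS, IMAGE_TYPES)
]
print(len(configurations))
for name in configurations:
    print(name)
```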

The environment we set up for the retraining process is TensorFlow 1.12 with Python 3.6, running on a
custom-built PC with an Intel Core i7 processor and 16 GB of memory. No GPU acceleration was used.


3. RESULTS AND DISCUSSION 
3.1. The expert judgement 

From the data gathering step, we succeeded in assembling about 1000 samples and taking about
12000 pictures of those samples. The recapitulation of the samples and pictures gathered for this dataset is shown in
Table 1. From the expert group discussion about deciding which part of the sample is best for identifying
batik and non-batik, the experts concluded that there are two strong visual traits for identifying batik
authenticity. Those two traits are unfortunately somewhat abstract. When the experts tried to explain those traits
to the non-experts on the team, we non-experts could not grasp the concept they were trying to explain. After a long
explanation and question-and-answer session we still could not apply what they said to differentiate between the
classes in this batik and non-batik problem. On the other hand, the experts seem capable of identifying batik and
non-batik products, even into the five categories, which is harder to do. Figures 3-4 show images that the experts
claim show one of the two visual traits. To us non-experts the two images look similar, in the sense that we cannot
conclude which one is batik and which is non-batik. Fortunately, most of the experts we tested with the images
answered correctly. Some experts got it wrong by answering print malam instead of the correct answer, batik cap.
The print malam imitation is the batik imitation that is considerably difficult to identify, because the technique
produces results similar to an authentic batik product. All of the experts agree that depending solely on visual
traits to identify the authenticity of a batik product leads to an uncertain conclusion. Even when using all the
physical and visual traits, an expert sometimes cannot be sure of a batik's authenticity. The standard labeling
process for authentic batik products requires the evaluators to examine the production process to make sure that
the factory truly produces authentic batik products.


Figure 3. Batik cap image, 60x magnification
Figure 4. Cabut warna imitation image, 60x magnification


Table 1. Dataset recapitulation 


Sample   60x              20x              Frame            Total
type     image   sample   image   sample   image   sample   image   sample
CW         591      119     590      118     238      119    1419      356
BC        1351      271    1351      271     542      271    3244      813
PM         442       89     440       88     175       88    1057      265
PW        1491      299    1491      299     598      299    3580      897
BT        1266      254    1265      253     506      253    3037      760
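
The totals in Table 1 can be cross-checked with a few lines of arithmetic. The figures below are copied from the table. The estimate of distinct physical samples (the largest per-type sample count in each class) is an assumption made here, since the table's total column adds the sample counts of the three image types together.

```python
# (images, samples) per image type, copied from Table 1.
TABLE_1 = {
    "CW": {"60x": (591, 119), "20x": (590, 118), "frame": (238, 119)},
    "BC": {"60x": (1351, 271), "20x": (1351, 271), "frame": (542, 271)},
    "PM": {"60x": (442, 89), "20x": (440, 88), "frame": (175, 88)},
    "PW": {"60x": (1491, 299), "20x": (1491, 299), "frame": (598, 299)},
    "BT": {"60x": (1266, 254), "20x": (1265, 253), "frame": (506, 253)},
}
REPORTED_TOTALS = {
    "CW": (1419, 356),
    "BC": (3244, 813),
    "PM": (1057, 265),
    "PW": (3580, 897),
    "BT": (3037, 760),
}

def row_totals(row):
    """Sum the image and sample counts of one class over the three image types."""
    images = sum(i for i, _ in row.values())
    samples = sum(s for _, s in row.values())
    return images, samples

total_images = sum(row_totals(row)[0] for row in TABLE_1.values())
# Assumption: the largest per-type count approximates the distinct samples of a class.
distinct_samples = sum(max(s for _, s in row.values()) for row in TABLE_1.values())
print(total_images, distinct_samples)
```

Under that assumption the table gives 12337 images from roughly 1032 samples, in line with the "about 12000" and "about 1000" figures quoted in the text.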


3.2. Result expected 

The differing results given by the expert judgement show that there is no certain method for
identifying batik authenticity. Even the best current method, expert judgement, still leaves errors. Therefore
we do not set 100% accuracy as our aim. Instead we set the accuracy of expert judgement as our target: the
resulting model is expected to match the experts' accuracy. In other words, we treat the expert judgement
accuracy as the 100% level we want to achieve.


3.3. Expert suggestion 

Because there is no well-written procedure for manually identifying batik authenticity that can be
followed step by step, and because the experts cannot clearly explain the method they use to find the traits, we
conclude that those traits are abstract to some degree. We therefore simply follow the experts' suggestion that
in certain pictures of a sample there are traits that can be seen, even if some expertise is needed to notice
them. We took pictures of those parts and then used them to train a CNN model to solve the authentic batik
identification problem. The experts suggest that there are traits to be found in pictures of a sample taken with
a digital microscope at 60x and at 20x magnification. One of the experts said that those traits
are the main traits for spotting the print malam product, the hardest imitation to identify. Besides the 60x and
20x magnification images, we also include regular photos from the smartphone camera, which we call full frame
images. We also try to identify this third type of image through the CNN retraining process. For each sample we
took 5 pictures at 60x magnification, 5 pictures at 20x magnification, and 2 full frame images, for a total of
12 pictures per sample.


3.4. Models accuracies 

The model retraining process, or transfer learning, was carried out on pretrained models: the
InceptionV3 and MobileNetV2 models that have been trained on the ImageNet dataset. We downloaded copies of
these two models from the TensorFlow website. For the batik and non-batik problem, or two-class problem,
we tried both models, just as we did for the five-class problem. First we trained an
InceptionV3 model on the dataset of 60x magnification images for the two-class problem. The result is 74.9%
accuracy. Then we retrained another copy on the 20x magnification image dataset for the two-class problem, which
resulted in 75.9% accuracy. For the full frame dataset of the two-class problem, the InceptionV3 model
achieved 70.6% accuracy. The full set of model accuracy results is shown in Table 2. The average time needed to
fully retrain a model with the prepared datasets is less than one hour.


Table 2. Models accuracy 


Models                   Accuracy (%)
mobilenet bnb 20         62.8
mobilenet bnb 60         77.8
mobilenet bnb frame      72.7
mobilenet all 20         57.5
mobilenet all 60         59
mobilenet all frame      56.8
inceptionV3 bnb 20       75.9
inceptionV3 bnb 60       74.9
inceptionV3 bnb frame    70.6
inceptionV3 all 20       59.9
inceptionV3 all 60       62.1
inceptionV3 all frame    56.8


Overall the results are largely as expected. The InceptionV3 models achieve equal or higher accuracy than the
MobileNetV2 models on every five-class dataset, although Table 2 shows MobileNetV2 ahead on the two-class 60x
and full frame datasets. The MobileNet models are designed to run in a mobile environment, so minimizing
model size and run time is a priority of the MobileNet design. Through simplification, the
MobileNet models end up with a smaller size, simpler calculation, and faster run time at the price of an
acceptable loss in accuracy. The MobileNetV2 model we trained is approximately 20 MB in size, four times smaller
than the 80 MB of the InceptionV3 models.

The models for the two-class problem also perform better than the models for the five-class
problem. The two-class problem is clearly the simpler task, so the higher accuracy
was expected.
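
The comparison between the two problems can be summarized by averaging the reported accuracies. The values below are taken from Table 2, except that the two-class 20x InceptionV3 entry uses the 75.9% stated in the text of this section; the dictionary layout and function name are illustrative.

```python
# Accuracy (%) keyed by (architecture, problem, image type).
ACCURACY = {
    ("mobilenet", "bnb", "20"): 62.8,
    ("mobilenet", "bnb", "60"): 77.8,
    ("mobilenet", "bnb", "frame"): 72.7,
    ("mobilenet", "all", "20"): 57.5,
    ("mobilenet", "all", "60"): 59.0,
    ("mobilenet", "all", "frame"): 56.8,
    ("inceptionV3", "bnb", "20"): 75.9,  # value stated in the text of Section 3.4
    ("inceptionV3", "bnb", "60"): 74.9,
    ("inceptionV3", "bnb", "frame"): 70.6,
    ("inceptionV3", "all", "20"): 59.9,
    ("inceptionV3", "all", "60"): 62.1,
    ("inceptionV3", "all", "frame"): 56.8,
}

def mean_accuracy(problem):
    """Average accuracy over all models trained for one problem."""
    values = [acc for (_, prob, _), acc in ACCURACY.items() if prob == problem]
    return sum(values) / len(values)

two_class = mean_accuracy("bnb")
five_class = mean_accuracy("all")
print(round(two_class, 2), round(five_class, 2))
```

On these figures the two-class models average about 72.5% and the five-class models about 58.7%.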


4. CONCLUSION 

Hard-to-identify objects are a new area in the image recognition world. Taking a step further from
the fine-grained recognition problem, a hard-to-identify object problem tries to solve the identification of an
object that even an expert has difficulty identifying. Take the identification of batik
and non-batik as an example. A batik expert can try to identify batik authenticity with the help of the
Indonesian national standard of batik (SNI Batik) as guidance, or through the abstract knowledge and experience
they have as long-time dedicated experts. But the identification result may vary among the experts themselves.
Thus the problem of distinguishing batik from its imitation is a hard-to-identify object problem, because even an
expert has difficulty doing the task. We try to solve this problem with the help of CNN. We retrain
InceptionV3 and MobileNetV2 models previously pretrained on the ImageNet dataset. Overall, the InceptionV3-based
models perform better than the MobileNetV2-based models, models with fewer class categories have
better accuracy, and models trained with the type of image suggested by the experts also gain better accuracy
than models trained with a randomly taken image.